Overview

Dataset statistics

Number of variables: 17
Number of observations: 1003
Missing cells: 149
Missing cells (%): 0.9%
Duplicate rows: 3
Duplicate rows (%): 0.3%
Total size in memory: 133.3 KiB
Average record size in memory: 136.1 B
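The missing-cell and duplicate-row percentages can be cross-checked from the counts reported in this document. The sketch below assumes that the four per-variable missing counts listed in the variable summaries (Customer type, Product line, Quantity, Unit price) are the only sources of missing cells; the variable names in the code are illustrative.

```python
# Per-variable missing counts, copied from the variable summaries in this report.
missing_by_variable = {
    "Customer type": 79,
    "Product line": 43,
    "Quantity": 20,
    "Unit price": 7,
}
n_variables = 17
n_observations = 1003
n_duplicate_rows = 3

# Total missing cells and their share of all cells (rows x columns).
missing_cells = sum(missing_by_variable.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of rows flagged as duplicates.
duplicate_pct = 100 * n_duplicate_rows / n_observations

print(missing_cells, f"{missing_pct:.1f}%", f"{duplicate_pct:.1f}%")
```

Under that assumption the per-variable counts add up to the reported 149 missing cells, and both percentages round to the values shown above.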

Variable types

Categorical: 10
Numeric: 7

Alerts

gross margin percentage has constant value "4.761904762" [Constant]
Dataset has 3 (0.3%) duplicate rows [Duplicates]
Invoice ID has a high cardinality: 1000 distinct values [High cardinality]
Date has a high cardinality: 89 distinct values [High cardinality]
Time has a high cardinality: 506 distinct values [High cardinality]
Unit price is highly correlated with Tax 5% and 3 other fields [High correlation]
Quantity is highly correlated with Tax 5% and 3 other fields [High correlation]
Tax 5% is highly correlated with Unit price and 4 other fields [High correlation]
Total is highly correlated with Unit price and 4 other fields [High correlation]
cogs is highly correlated with Unit price and 4 other fields [High correlation]
gross income is highly correlated with Unit price and 4 other fields [High correlation]
Payment is highly correlated with gross margin percentage [High correlation]
City is highly correlated with Branch [High correlation]
gross margin percentage is highly correlated with Payment and 6 other fields [High correlation]
Branch is highly correlated with City [High correlation]
Gender is highly correlated with gross margin percentage [High correlation]
Date is highly correlated with gross margin percentage [High correlation]
Product line is highly correlated with gross margin percentage [High correlation]
Customer type is highly correlated with gross margin percentage [High correlation]
Customer type has 79 (7.9%) missing values [Missing]
Product line has 43 (4.3%) missing values [Missing]
Quantity has 20 (2.0%) missing values [Missing]
Invoice ID is uniformly distributed [Uniform]
Time is uniformly distributed [Uniform]
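The correlation alerts among Tax 5%, Total, cogs and gross income, and the constant gross margin percentage, are consistent with these columns being fixed multiples of one another. The report does not state that relationship; the sketch below only checks it against the means reported in the variable summaries. The 5% rate is an assumption read off the column name "Tax 5%".

```python
import math

# Means copied from the numeric variable summaries in this report.
mean_cogs = 308.0073579
mean_tax = 15.4003679
mean_total = 323.4077258
mean_gross_income = 15.4003679

TAX_RATE = 0.05  # assumption, taken from the column name "Tax 5%"

# Tax looks like 5% of cogs, and Total looks like cogs plus tax.
tax_matches = math.isclose(mean_tax, TAX_RATE * mean_cogs, rel_tol=1e-6)
total_matches = math.isclose(mean_total, mean_cogs + mean_tax, rel_tol=1e-6)

# gross income has the same mean as Tax 5%.
income_matches = math.isclose(mean_gross_income, mean_tax, rel_tol=1e-9)

# If gross income = rate * cogs and Total = (1 + rate) * cogs,
# then gross income / Total is the same for every row.
margin_pct = 100 * TAX_RATE / (1 + TAX_RATE)

print(tax_matches, total_matches, income_matches, round(margin_pct, 9))
```

If the relationship holds row by row, a constant margin of about 4.761904762 follows directly, which would also explain why Tax 5%, Total, cogs and gross income share the same coefficient of variation, skewness and kurtosis.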

Reproduction

Analysis started: 2022-10-18 20:50:47.407437
Analysis finished: 2022-10-18 20:50:54.381403
Duration: 6.97 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

Invoice ID
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1000
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value               Count
849-09-3807         2
452-04-8808         2
745-74-0715         2
491-38-3499         1
322-02-2271         1
Other values (995)  995

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 11033
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 997
Unique (%): 99.4%

Sample

1st row: 750-67-8428
2nd row: 226-31-3081
3rd row: 631-41-3108
4th row: 123-19-1176
5th row: 373-73-7910

Common Values

Value               Count  Frequency (%)
849-09-3807         2      0.2%
452-04-8808         2      0.2%
745-74-0715         2      0.2%
491-38-3499         1      0.1%
322-02-2271         1      0.1%
816-72-8853         1      0.1%
842-29-4695         1      0.1%
725-67-2480         1      0.1%
642-61-4706         1      0.1%
641-51-2661         1      0.1%
Other values (990)  990    98.7%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
849-09-3807         2      0.2%
745-74-0715         2      0.2%
452-04-8808         2      0.2%
433-75-6987         1      0.1%
252-56-2699         1      0.1%
871-79-8483         1      0.1%
848-62-7243         1      0.1%
631-41-3108         1      0.1%
123-19-1176         1      0.1%
373-73-7910         1      0.1%
Other values (990)  990    98.7%

Most occurring characters

ValueCountFrequency (%)
-2006
18.2%
2958
8.7%
6954
8.6%
1951
8.6%
8949
8.6%
5930
8.4%
4923
8.4%
3910
8.2%
7899
8.1%
0814
7.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number9027
81.8%
Dash Punctuation2006
 
18.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2958
10.6%
6954
10.6%
1951
10.5%
8949
10.5%
5930
10.3%
4923
10.2%
3910
10.1%
7899
10.0%
0814
9.0%
9739
8.2%
Dash Punctuation
ValueCountFrequency (%)
-2006
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common11033
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
-2006
18.2%
2958
8.7%
6954
8.6%
1951
8.6%
8949
8.6%
5930
8.4%
4923
8.4%
3910
8.2%
7899
8.1%
0814
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII11033
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-2006
18.2%
2958
8.7%
6954
8.6%
1951
8.6%
8949
8.6%
5930
8.4%
4923
8.4%
3910
8.2%
7899
8.1%
0814
7.4%

Branch
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value  Count
A      342
B      333
C      328

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1003
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: C
3rd row: A
4th row: A
5th row: A

Common Values

Value  Count  Frequency (%)
A      342    34.1%
B      333    33.2%
C      328    32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
a      342    34.1%
b      333    33.2%
c      328    32.7%

Most occurring characters

ValueCountFrequency (%)
A342
34.1%
B333
33.2%
C328
32.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1003
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A342
34.1%
B333
33.2%
C328
32.7%

Most occurring scripts

ValueCountFrequency (%)
Latin1003
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A342
34.1%
B333
33.2%
C328
32.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1003
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A342
34.1%
B333
33.2%
C328
32.7%

City
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value      Count
Yangon     342
Mandalay   333
Naypyitaw  328

Length

Max length: 9
Median length: 8
Mean length: 7.645064806
Min length: 6

Characters and Unicode

Total characters: 7668
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yangon
2nd row: Naypyitaw
3rd row: Yangon
4th row: Yangon
5th row: Yangon

Common Values

Value      Count  Frequency (%)
Yangon     342    34.1%
Mandalay   333    33.2%
Naypyitaw  328    32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count  Frequency (%)
yangon     342    34.1%
mandalay   333    33.2%
naypyitaw  328    32.7%

Most occurring characters

ValueCountFrequency (%)
a1997
26.0%
n1017
13.3%
y989
12.9%
Y342
 
4.5%
g342
 
4.5%
o342
 
4.5%
M333
 
4.3%
d333
 
4.3%
l333
 
4.3%
N328
 
4.3%
Other values (4)1312
17.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6665
86.9%
Uppercase Letter1003
 
13.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1997
30.0%
n1017
15.3%
y989
14.8%
g342
 
5.1%
o342
 
5.1%
d333
 
5.0%
l333
 
5.0%
p328
 
4.9%
i328
 
4.9%
t328
 
4.9%
Uppercase Letter
ValueCountFrequency (%)
Y342
34.1%
M333
33.2%
N328
32.7%

Most occurring scripts

ValueCountFrequency (%)
Latin7668
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1997
26.0%
n1017
13.3%
y989
12.9%
Y342
 
4.5%
g342
 
4.5%
o342
 
4.5%
M333
 
4.3%
d333
 
4.3%
l333
 
4.3%
N328
 
4.3%
Other values (4)1312
17.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII7668
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1997
26.0%
n1017
13.3%
y989
12.9%
Y342
 
4.5%
g342
 
4.5%
o342
 
4.5%
M333
 
4.3%
d333
 
4.3%
l333
 
4.3%
N328
 
4.3%
Other values (4)1312
17.1%

Customer type
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.2%
Missing: 79
Missing (%): 7.9%
Memory size: 8.0 KiB

Value   Count
Normal  470
Member  454

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 5544
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Member
2nd row: Normal
3rd row: Normal
4th row: Member
5th row: Normal

Common Values

Value      Count  Frequency (%)
Normal     470    46.9%
Member     454    45.3%
(Missing)  79     7.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count  Frequency (%)
normal  470    50.9%
member  454    49.1%
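The two Customer type tables above report different percentages for the same counts (46.9% versus 50.9% for Normal). The sketch below suggests why: the figures are reproduced if the Common Values table divides by all 1003 rows, while the second table divides only by the 924 non-missing values. That reading is an inference from the numbers, not something the report states.

```python
# Counts copied from the Customer type tables in this report.
counts = {"Normal": 470, "Member": 454}
n_missing = 79

n_present = sum(counts.values())   # rows with a value
n_rows = n_present + n_missing     # all rows, including missing

# Share of all rows (missing values kept in the denominator).
pct_of_all = {k: round(100 * v / n_rows, 1) for k, v in counts.items()}

# Share of non-missing rows only.
pct_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print(n_rows, pct_of_all, pct_of_present)
```

When comparing category shares across variables, it helps to note which denominator a table uses, since only variables with missing values show this gap.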

Most occurring characters

ValueCountFrequency (%)
r924
16.7%
m924
16.7%
e908
16.4%
N470
8.5%
o470
8.5%
a470
8.5%
l470
8.5%
M454
8.2%
b454
8.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4620
83.3%
Uppercase Letter924
 
16.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r924
20.0%
m924
20.0%
e908
19.7%
o470
10.2%
a470
10.2%
l470
10.2%
b454
9.8%
Uppercase Letter
ValueCountFrequency (%)
N470
50.9%
M454
49.1%

Most occurring scripts

ValueCountFrequency (%)
Latin5544
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r924
16.7%
m924
16.7%
e908
16.4%
N470
8.5%
o470
8.5%
a470
8.5%
l470
8.5%
M454
8.2%
b454
8.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII5544
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r924
16.7%
m924
16.7%
e908
16.4%
N470
8.5%
o470
8.5%
a470
8.5%
l470
8.5%
M454
8.2%
b454
8.2%

Gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value   Count
Female  502
Male    501

Length

Max length: 6
Median length: 6
Mean length: 5.000997009
Min length: 4

Characters and Unicode

Total characters: 5016
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Male
4th row: Male
5th row: Male

Common Values

Value   Count  Frequency (%)
Female  502    50.0%
Male    501    50.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count  Frequency (%)
female  502    50.0%
male    501    50.0%
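The length statistics for Gender follow directly from the category counts. As a minimal check, using only the counts reported above, the total character count and mean length can be recomputed like this:

```python
# Category counts copied from the Gender tables in this report.
counts = {"Female": 502, "Male": 501}

# Each row contributes the length of its category label.
total_chars = sum(len(label) * n for label, n in counts.items())
n_rows = sum(counts.values())
mean_length = total_chars / n_rows

print(total_chars, n_rows, round(mean_length, 9))
```

The same calculation applies to any categorical variable in the report whose labels and counts are fully listed.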

Most occurring characters

ValueCountFrequency (%)
e1505
30.0%
a1003
20.0%
l1003
20.0%
F502
 
10.0%
m502
 
10.0%
M501
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4013
80.0%
Uppercase Letter1003
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1505
37.5%
a1003
25.0%
l1003
25.0%
m502
 
12.5%
Uppercase Letter
ValueCountFrequency (%)
F502
50.0%
M501
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5016
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1505
30.0%
a1003
20.0%
l1003
20.0%
F502
 
10.0%
m502
 
10.0%
M501
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII5016
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1505
30.0%
a1003
20.0%
l1003
20.0%
F502
 
10.0%
m502
 
10.0%
M501
 
10.0%

Product line
Categorical

HIGH CORRELATION
MISSING

Distinct: 6
Distinct (%): 0.6%
Missing: 43
Missing (%): 4.3%
Memory size: 8.0 KiB

Value                   Count
Fashion accessories     172
Electronic accessories  165
Food and beverages      165
Sports and travel       163
Home and lifestyle      151

Length

Max length: 22
Median length: 19
Mean length: 18.546875
Min length: 17

Characters and Unicode

Total characters: 17805
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Health and beauty
2nd row: Electronic accessories
3rd row: Home and lifestyle
4th row: Health and beauty
5th row: Sports and travel

Common Values

Value                   Count  Frequency (%)
Fashion accessories     172    17.1%
Electronic accessories  165    16.5%
Food and beverages      165    16.5%
Sports and travel       163    16.3%
Home and lifestyle      151    15.1%
Health and beauty       144    14.4%
(Missing)               43     4.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value             Count  Frequency (%)
and               623    24.5%
accessories       337    13.3%
fashion           172    6.8%
electronic        165    6.5%
food              165    6.5%
beverages         165    6.5%
sports            163    6.4%
travel            163    6.4%
home              151    5.9%
lifestyle         151    5.9%
Other values (2)  288    11.3%

Most occurring characters

ValueCountFrequency (%)
e2238
12.6%
a1748
 
9.8%
s1662
 
9.3%
1583
 
8.9%
o1318
 
7.4%
c1004
 
5.6%
r993
 
5.6%
n960
 
5.4%
t930
 
5.2%
i825
 
4.6%
Other values (15)4544
25.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter15262
85.7%
Space Separator1583
 
8.9%
Uppercase Letter960
 
5.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2238
14.7%
a1748
11.5%
s1662
10.9%
o1318
8.6%
c1004
 
6.6%
r993
 
6.5%
n960
 
6.3%
t930
 
6.1%
i825
 
5.4%
d788
 
5.2%
Other values (10)2796
18.3%
Uppercase Letter
ValueCountFrequency (%)
F337
35.1%
H295
30.7%
E165
17.2%
S163
17.0%
Space Separator
ValueCountFrequency (%)
1583
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin16222
91.1%
Common1583
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2238
13.8%
a1748
10.8%
s1662
10.2%
o1318
 
8.1%
c1004
 
6.2%
r993
 
6.1%
n960
 
5.9%
t930
 
5.7%
i825
 
5.1%
d788
 
4.9%
Other values (14)3756
23.2%
Common
ValueCountFrequency (%)
1583
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII17805
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2238
12.6%
a1748
 
9.8%
s1662
 
9.3%
1583
 
8.9%
o1318
 
7.4%
c1004
 
5.6%
r993
 
5.6%
n960
 
5.4%
t930
 
5.2%
i825
 
4.6%
Other values (15)4544
25.5%

Unit price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 938
Distinct (%): 94.2%
Missing: 7
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.76456827
Minimum: 10.08
Maximum: 99.96
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 10.08
5-th percentile: 15.275
Q1: 33.125
Median: 55.42
Q3: 78.085
95-th percentile: 97.2125
Maximum: 99.96
Range: 89.88
Interquartile range (IQR): 44.96

Descriptive statistics

Standard deviation: 26.51016538
Coefficient of variation (CV): 0.4753944342
Kurtosis: -1.222670132
Mean: 55.76456827
Median Absolute Deviation (MAD): 22.575
Skewness: 0.0001753484848
Sum: 55541.51
Variance: 702.7888687
Monotonicity: Not monotonic
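Several of the derived statistics for Unit price can be recomputed from the primary ones. The sketch below uses the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared) and the values reported above; it is a consistency check, not a description of how the profiling tool computes them internally.

```python
import math

# Values copied from the Unit price summary in this report.
mean, std = 55.76456827, 26.51016538
q1, q3 = 33.125, 78.085
minimum, maximum = 10.08, 99.96

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
variance = std ** 2                # variance

# Compare against the reported figures, allowing for rounding in the report.
cv_ok = math.isclose(cv, 0.4753944342, rel_tol=1e-6)
iqr_ok = math.isclose(iqr, 44.96, abs_tol=1e-9)
range_ok = math.isclose(value_range, 89.88, abs_tol=1e-9)
variance_ok = math.isclose(variance, 702.7888687, rel_tol=1e-6)

print(cv_ok, iqr_ok, range_ok, variance_ok)
```

The same relations can be checked for the other numeric variables by substituting their reported values.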
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
83.773
 
0.3%
88.342
 
0.2%
60.32
 
0.2%
32.322
 
0.2%
32.252
 
0.2%
99.962
 
0.2%
45.582
 
0.2%
39.752
 
0.2%
68.712
 
0.2%
45.382
 
0.2%
Other values (928)975
97.2%
(Missing)7
 
0.7%
ValueCountFrequency (%)
10.081
0.1%
10.131
0.1%
10.161
0.1%
10.171
0.1%
10.181
0.1%
10.531
0.1%
10.561
0.1%
10.591
0.1%
10.691
0.1%
10.751
0.1%
ValueCountFrequency (%)
99.962
0.2%
99.921
0.1%
99.891
0.1%
99.831
0.1%
99.822
0.2%
99.791
0.1%
99.781
0.1%
99.731
0.1%
99.711
0.1%
99.71
0.1%

Quantity
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 10
Distinct (%): 1.0%
Missing: 20
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.501525941
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 5
Q3: 8
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.924673413
Coefficient of variation (CV): 0.5316113101
Kurtosis: -1.216926049
Mean: 5.501525941
Median Absolute Deviation (MAD): 2
Skewness: 0.01667945411
Sum: 5408
Variance: 8.553714573
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value      Count  Frequency (%)
10         116    11.6%
1          111    11.1%
4          108    10.8%
7          100    10.0%
5          100    10.0%
6          96     9.6%
9          92     9.2%
2          89     8.9%
3          89     8.9%
8          82     8.2%
(Missing)  20     2.0%

Value  Count  Frequency (%)
1      111    11.1%
2      89     8.9%
3      89     8.9%
4      108    10.8%
5      100    10.0%
6      96     9.6%
7      100    10.0%
8      82     8.2%
9      92     9.2%
10     116    11.6%

Value  Count  Frequency (%)
10     116    11.6%
9      92     9.2%
8      82     8.2%
7      100    10.0%
6      96     9.6%
5      100    10.0%
4      108    10.8%
3      89     8.9%
2      89     8.9%
1      111    11.1%
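Because Quantity takes only ten distinct values, its sum, mean and missing count can be rebuilt from the frequency table alone. The following sketch does that with the counts listed above; it assumes the mean is taken over non-missing rows only, which is what the reported figures imply.

```python
# value -> count, copied from the Quantity frequency table in this report.
quantity_counts = {
    10: 116, 1: 111, 4: 108, 7: 100, 5: 100,
    6: 96, 9: 92, 2: 89, 3: 89, 8: 82,
}
n_observations = 1003

# Rows with a value, and rows without one.
n_present = sum(quantity_counts.values())
n_missing = n_observations - n_present

# Sum and mean over the non-missing rows.
total = sum(value * count for value, count in quantity_counts.items())
mean = total / n_present

print(n_present, n_missing, total, round(mean, 9))
```

Dividing the sum by all 1003 rows instead would give a lower figure, so the denominator matters whenever a variable has missing values.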

Tax 5%
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 990
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.4003679
Minimum: 0.5085
Maximum: 49.65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0.5085
5-th percentile: 1.9575
Q1: 5.89475
Median: 12.096
Q3: 22.5395
95-th percentile: 39.146
Maximum: 49.65
Range: 49.1415
Interquartile range (IQR): 16.64475

Descriptive statistics

Standard deviation: 11.71519182
Coefficient of variation (CV): 0.760708569
Kurtosis: -0.09709035236
Mean: 15.4003679
Median Absolute Deviation (MAD): 7.518
Skewness: 0.8869824091
Sum: 15446.569
Variance: 137.2457195
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30.9192
 
0.2%
9.00452
 
0.2%
4.4642
 
0.2%
10.36352
 
0.2%
13.1882
 
0.2%
8.3772
 
0.2%
39.482
 
0.2%
5.8032
 
0.2%
10.3262
 
0.2%
12.572
 
0.2%
Other values (980)983
98.0%
ValueCountFrequency (%)
0.50851
0.1%
0.60451
0.1%
0.6271
0.1%
0.6391
0.1%
0.6991
0.1%
0.7671
0.1%
0.77151
0.1%
0.7751
0.1%
0.8141
0.1%
0.88751
0.1%
ValueCountFrequency (%)
49.651
0.1%
49.491
0.1%
49.261
0.1%
48.751
0.1%
48.691
0.1%
48.6851
0.1%
48.6051
0.1%
47.791
0.1%
47.721
0.1%
45.3251
0.1%

Total
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 990
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 323.4077258
Minimum: 10.6785
Maximum: 1042.65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 10.6785
5-th percentile: 41.1075
Q1: 123.78975
Median: 254.016
Q3: 473.3295
95-th percentile: 822.066
Maximum: 1042.65
Range: 1031.9715
Interquartile range (IQR): 349.53975

Descriptive statistics

Standard deviation: 246.0190283
Coefficient of variation (CV): 0.760708569
Kurtosis: -0.09709035236
Mean: 323.4077258
Median Absolute Deviation (MAD): 157.878
Skewness: 0.8869824091
Sum: 324377.949
Variance: 60525.36229
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
649.2992
 
0.2%
189.09452
 
0.2%
93.7442
 
0.2%
217.63352
 
0.2%
276.9482
 
0.2%
175.9172
 
0.2%
829.082
 
0.2%
121.8632
 
0.2%
216.8462
 
0.2%
263.972
 
0.2%
Other values (980)983
98.0%
ValueCountFrequency (%)
10.67851
0.1%
12.69451
0.1%
13.1671
0.1%
13.4191
0.1%
14.6791
0.1%
16.1071
0.1%
16.20151
0.1%
16.2751
0.1%
17.0941
0.1%
18.63751
0.1%
ValueCountFrequency (%)
1042.651
0.1%
1039.291
0.1%
1034.461
0.1%
1023.751
0.1%
1022.491
0.1%
1022.3851
0.1%
1020.7051
0.1%
1003.591
0.1%
1002.121
0.1%
951.8251
0.1%

Date
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 89
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value              Count
2/7/19             20
2/15/19            19
1/26/19            18
3/2/19             18
1/8/19             18
Other values (84)  910

Length

Max length: 7
Median length: 7
Mean length: 6.677966102
Min length: 6

Characters and Unicode

Total characters: 6698
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1/5/19
2nd row: 3/8/19
3rd row: 3/3/19
4th row: 1/27/19
5th row: 2/8/19

Common Values

Value              Count  Frequency (%)
2/7/19             20     2.0%
2/15/19            19     1.9%
1/26/19            18     1.8%
3/2/19             18     1.8%
1/8/19             18     1.8%
3/14/19            18     1.8%
1/25/19            17     1.7%
3/5/19             17     1.7%
1/23/19            17     1.7%
3/9/19             16     1.6%
Other values (79)  825    82.3%
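Date is profiled as Categorical because its values are plain strings such as 1/27/19. A minimal sketch of parsing the sample rows into real dates follows. The month/day/two-digit-year order is an assumption; it is supported by values like 1/27/19 and 2/15/19, whose second field exceeds 12, but the report does not state the format.

```python
from datetime import datetime

# Sample rows copied from the Date summary in this report.
samples = ["1/5/19", "3/8/19", "3/3/19", "1/27/19", "2/8/19"]

# Assumed layout: month/day/two-digit year.
DATE_FORMAT = "%m/%d/%y"

parsed = [datetime.strptime(s, DATE_FORMAT).date() for s in samples]
iso_dates = [d.isoformat() for d in parsed]

print(iso_dates)
```

Once parsed, the column can be treated as a date rather than as 89 unrelated categories, which would likely remove the high-cardinality alert.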

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
2/7/19             20     2.0%
2/15/19            19     1.9%
1/26/19            18     1.8%
3/2/19             18     1.8%
1/8/19             18     1.8%
3/14/19            18     1.8%
1/25/19            17     1.7%
3/5/19             17     1.7%
1/23/19            17     1.7%
3/19/19            16     1.6%
Other values (79)  825    82.3%

Most occurring characters

ValueCountFrequency (%)
/2006
29.9%
11769
26.4%
91101
16.4%
2725
 
10.8%
3480
 
7.2%
5127
 
1.9%
7106
 
1.6%
4101
 
1.5%
6100
 
1.5%
895
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4692
70.1%
Other Punctuation2006
29.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
11769
37.7%
91101
23.5%
2725
15.5%
3480
 
10.2%
5127
 
2.7%
7106
 
2.3%
4101
 
2.2%
6100
 
2.1%
895
 
2.0%
088
 
1.9%
Other Punctuation
ValueCountFrequency (%)
/2006
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common6698
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
/2006
29.9%
11769
26.4%
91101
16.4%
2725
 
10.8%
3480
 
7.2%
5127
 
1.9%
7106
 
1.6%
4101
 
1.5%
6100
 
1.5%
895
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII6698
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/2006
29.9%
11769
26.4%
91101
16.4%
2725
 
10.8%
3480
 
7.2%
5127
 
1.9%
7106
 
1.6%
4101
 
1.5%
6100
 
1.5%
895
 
1.4%

Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 506
Distinct (%): 50.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value               Count
19:48               7
14:42               7
17:38               6
17:16               5
10:11               5
Other values (501)  973

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 5015
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 209
Unique (%): 20.8%

Sample

1st row: 13:08
2nd row: 10:29
3rd row: 13:23
4th row: 20:33
5th row: 10:37

Common Values

Value               Count  Frequency (%)
19:48               7      0.7%
14:42               7      0.7%
17:38               6      0.6%
17:16               5      0.5%
10:11               5      0.5%
17:36               5      0.5%
13:58               5      0.5%
19:20               5      0.5%
11:40               5      0.5%
19:44               5      0.5%
Other values (496)  948    94.5%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
19:48               7      0.7%
14:42               7      0.7%
17:38               6      0.6%
11:40               5      0.5%
13:48               5      0.5%
19:30               5      0.5%
11:51               5      0.5%
19:44               5      0.5%
19:39               5      0.5%
19:20               5      0.5%
Other values (496)  948    94.5%

Most occurring characters

ValueCountFrequency (%)
11253
25.0%
:1003
20.0%
2443
 
8.8%
0438
 
8.7%
3379
 
7.6%
4377
 
7.5%
5355
 
7.1%
8217
 
4.3%
9200
 
4.0%
6185
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4012
80.0%
Other Punctuation1003
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
11253
31.2%
2443
 
11.0%
0438
 
10.9%
3379
 
9.4%
4377
 
9.4%
5355
 
8.8%
8217
 
5.4%
9200
 
5.0%
6185
 
4.6%
7165
 
4.1%
Other Punctuation
ValueCountFrequency (%)
:1003
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common5015
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
11253
25.0%
:1003
20.0%
2443
 
8.8%
0438
 
8.7%
3379
 
7.6%
4377
 
7.5%
5355
 
7.1%
8217
 
4.3%
9200
 
4.0%
6185
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII5015
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11253
25.0%
:1003
20.0%
2443
 
8.8%
0438
 
8.7%
3379
 
7.6%
4377
 
7.5%
5355
 
7.1%
8217
 
4.3%
9200
 
4.0%
6185
 
3.7%

Payment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value        Count
Ewallet      346
Cash         346
Credit card  311

Length

Max length: 11
Median length: 7
Mean length: 7.205383848
Min length: 4

Characters and Unicode

Total characters: 7227
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ewallet
2nd row: Cash
3rd row: Credit card
4th row: Ewallet
5th row: Ewallet

Common Values

Value        Count  Frequency (%)
Ewallet      346    34.5%
Cash         346    34.5%
Credit card  311    31.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count  Frequency (%)
ewallet  346    26.3%
cash     346    26.3%
credit   311    23.7%
card     311    23.7%

Most occurring characters

ValueCountFrequency (%)
a1003
13.9%
l692
9.6%
e657
9.1%
t657
9.1%
C657
9.1%
r622
8.6%
d622
8.6%
E346
 
4.8%
w346
 
4.8%
s346
 
4.8%
Other values (4)1279
17.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5913
81.8%
Uppercase Letter1003
 
13.9%
Space Separator311
 
4.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1003
17.0%
l692
11.7%
e657
11.1%
t657
11.1%
r622
10.5%
d622
10.5%
w346
 
5.9%
s346
 
5.9%
h346
 
5.9%
i311
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
C657
65.5%
E346
34.5%
Space Separator
ValueCountFrequency (%)
311
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6916
95.7%
Common311
 
4.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1003
14.5%
l692
10.0%
e657
9.5%
t657
9.5%
C657
9.5%
r622
9.0%
d622
9.0%
E346
 
5.0%
w346
 
5.0%
s346
 
5.0%
Other values (3)968
14.0%
Common
ValueCountFrequency (%)
311
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII7227
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1003
13.9%
l692
9.6%
e657
9.1%
t657
9.1%
C657
9.1%
r622
8.6%
d622
8.6%
E346
 
4.8%
w346
 
4.8%
s346
 
4.8%
Other values (4)1279
17.7%

cogs
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 990
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 308.0073579
Minimum: 10.17
Maximum: 993
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 10.17
5-th percentile: 39.15
Q1: 117.895
Median: 241.92
Q3: 450.79
95-th percentile: 782.92
Maximum: 993
Range: 982.83
Interquartile range (IQR): 332.895

Descriptive statistics

Standard deviation: 234.3038365
Coefficient of variation (CV): 0.760708569
Kurtosis: -0.09709035236
Mean: 308.0073579
Median Absolute Deviation (MAD): 150.36
Skewness: 0.8869824091
Sum: 308931.38
Variance: 54898.28779
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
618.382
 
0.2%
180.092
 
0.2%
89.282
 
0.2%
207.272
 
0.2%
263.762
 
0.2%
167.542
 
0.2%
789.62
 
0.2%
116.062
 
0.2%
206.522
 
0.2%
251.42
 
0.2%
Other values (980)983
98.0%
ValueCountFrequency (%)
10.171
0.1%
12.091
0.1%
12.541
0.1%
12.781
0.1%
13.981
0.1%
15.341
0.1%
15.431
0.1%
15.51
0.1%
16.281
0.1%
17.751
0.1%
ValueCountFrequency (%)
9931
0.1%
989.81
0.1%
985.21
0.1%
9751
0.1%
973.81
0.1%
973.71
0.1%
972.11
0.1%
955.81
0.1%
954.41
0.1%
906.51
0.1%

gross margin percentage
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB

Value        Count
4.761904762  1003

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 11033
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.761904762
2nd row: 4.761904762
3rd row: 4.761904762
4th row: 4.761904762
5th row: 4.761904762

Common Values

Value        Count  Frequency (%)
4.761904762  1003   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value        Count  Frequency (%)
4.761904762  1003   100.0%

Most occurring characters

ValueCountFrequency (%)
42006
18.2%
72006
18.2%
62006
18.2%
.1003
9.1%
11003
9.1%
91003
9.1%
01003
9.1%
21003
9.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number10030
90.9%
Other Punctuation1003
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
42006
20.0%
72006
20.0%
62006
20.0%
11003
10.0%
91003
10.0%
01003
10.0%
21003
10.0%
Other Punctuation
ValueCountFrequency (%)
.1003
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common11033
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
42006
18.2%
72006
18.2%
62006
18.2%
.1003
9.1%
11003
9.1%
91003
9.1%
01003
9.1%
21003
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII11033
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
42006
18.2%
72006
18.2%
62006
18.2%
.1003
9.1%
11003
9.1%
91003
9.1%
01003
9.1%
21003
9.1%

gross income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 990
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.4003679
Minimum: 0.5085
Maximum: 49.65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0.5085
5-th percentile: 1.9575
Q1: 5.89475
Median: 12.096
Q3: 22.5395
95-th percentile: 39.146
Maximum: 49.65
Range: 49.1415
Interquartile range (IQR): 16.64475

Descriptive statistics

Standard deviation: 11.71519182
Coefficient of variation (CV): 0.760708569
Kurtosis: -0.09709035236
Mean: 15.4003679
Median Absolute Deviation (MAD): 7.518
Skewness: 0.8869824091
Sum: 15446.569
Variance: 137.2457195
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30.9192
 
0.2%
9.00452
 
0.2%
4.4642
 
0.2%
10.36352
 
0.2%
13.1882
 
0.2%
8.3772
 
0.2%
39.482
 
0.2%
5.8032
 
0.2%
10.3262
 
0.2%
12.572
 
0.2%
Other values (980)983
98.0%
ValueCountFrequency (%)
0.50851
0.1%
0.60451
0.1%
0.6271
0.1%
0.6391
0.1%
0.6991
0.1%
0.7671
0.1%
0.77151
0.1%
0.7751
0.1%
0.8141
0.1%
0.88751
0.1%
ValueCountFrequency (%)
49.651
0.1%
49.491
0.1%
49.261
0.1%
48.751
0.1%
48.691
0.1%
48.6851
0.1%
48.6051
0.1%
47.791
0.1%
47.721
0.1%
45.3251
0.1%

Rating
Real number (ℝ≥0)

Distinct: 61
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.972681954
Minimum: 4
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 4
5-th percentile: 4.3
Q1: 5.5
Median: 7
Q3: 8.5
95-th percentile: 9.7
Maximum: 10
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.717646897
Coefficient of variation (CV): 0.2463394872
Kurtosis: -1.151294482
Mean: 6.972681954
Median Absolute Deviation (MAD): 1.5
Skewness: 0.009592348981
Sum: 6993.6
Variance: 2.950310864
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
6 | 26 | 2.6%
6.6 | 25 | 2.5%
4.2 | 22 | 2.2%
9.5 | 22 | 2.2%
6.5 | 21 | 2.1%
5 | 21 | 2.1%
6.2 | 21 | 2.1%
8 | 21 | 2.1%
5.1 | 21 | 2.1%
7.6 | 20 | 2.0%
Other values (51) | 783 | 78.1%

Value | Count | Frequency (%)
4 | 11 | 1.1%
4.1 | 17 | 1.7%
4.2 | 22 | 2.2%
4.3 | 18 | 1.8%
4.4 | 17 | 1.7%
4.5 | 17 | 1.7%
4.6 | 8 | 0.8%
4.7 | 12 | 1.2%
4.8 | 13 | 1.3%
4.9 | 18 | 1.8%

Value | Count | Frequency (%)
10 | 5 | 0.5%
9.9 | 16 | 1.6%
9.8 | 19 | 1.9%
9.7 | 14 | 1.4%
9.6 | 17 | 1.7%
9.5 | 22 | 2.2%
9.4 | 12 | 1.2%
9.3 | 16 | 1.6%
9.2 | 16 | 1.6%
9.1 | 14 | 1.4%

Interactions

[49 interaction plots; images not preserved in this text version]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
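The recipe above can be sketched in plain Python: rank each variable, then apply the covariance-over-standard-deviations formula to the ranks. The function names and the sample data are illustrative, and the handling of ties by average rank is an assumption made here (a common convention), not something this report specifies.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/(n-1) factors cancel, so unnormalised sums are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)  # ranks agree exactly, so rho is 1 up to rounding
r = pearson(x, y)     # below 1 because the relationship is not linear
```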

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. This implies that, for an exact linear relationship, the slope of the line (its angle to the x-axis) does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
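A minimal sketch of that calculation follows, together with a check of the invariance property described above. The function name and the data are hypothetical; the sample (n - 1) normalisation is an arbitrary choice here, since the factor cancels in the ratio.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Hypothetical data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# A steep and a shallow exact line both give r = 1 (up to rounding).
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])
```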

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
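The pair-counting definition can be written directly as a short function. This sketch implements the formula exactly as described (often called tau-a) and makes no adjustment for ties; the profiling library may well use a tie-corrected variant, so treat the function name and behaviour as illustrative only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair oppositely
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```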

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
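For orientation, here is a sketch of the bias-corrected estimator as it is usually stated: the chi-squared statistic is divided by n, the expected bias (k - 1)(r - 1)/(n - 1) is subtracted, and the numbers of rows and columns are corrected in the same spirit. The function name and example tables are hypothetical, and this is not necessarily how the report's software computes the value. It assumes every row and column total is non-zero.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical tables: independent counts and a perfectly associated pair.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
```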

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
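One way to picture nullity correlation is to turn each column into a 0/1 "value present" indicator and correlate the indicators. The sketch below does this for two hypothetical columns, with None standing in for a missing value. The function name and data are illustrative, and columns that are always present or always missing are rejected because their indicator has zero variance.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a column is entirely present or entirely missing")
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return cov / sqrt(var_a * var_b)


# Hypothetical columns, not the report's data.
unit_price = [74.69, None, 46.33, None, 86.31, 85.39]
quantity = [7.0, 5.0, None, 8.0, 7.0, None]
corr = nullity_correlation(unit_price, quantity)
```

A negative value, as in this example, means that one column tends to be missing where the other is present.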
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

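Index | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating
0 | 750-67-8428 | A | Yangon | Member | Female | Health and beauty | 74.69 | 7.0 | 26.1415 | 548.9715 | 1/5/19 | 13:08 | Ewallet | 522.83 | 4.761905 | 26.1415 | 9.1
1 | 226-31-3081 | C | Naypyitaw | Normal | Female | Electronic accessories | 15.28 | 5.0 | 3.8200 | 80.2200 | 3/8/19 | 10:29 | Cash | 76.40 | 4.761905 | 3.8200 | 9.6
2 | 631-41-3108 | A | Yangon | Normal | Male | Home and lifestyle | 46.33 | 7.0 | 16.2155 | 340.5255 | 3/3/19 | 13:23 | Credit card | 324.31 | 4.761905 | 16.2155 | 7.4
3 | 123-19-1176 | A | Yangon | Member | Male | Health and beauty | 58.22 | 8.0 | 23.2880 | 489.0480 | 1/27/19 | 20:33 | Ewallet | 465.76 | 4.761905 | 23.2880 | 8.4
4 | 373-73-7910 | A | Yangon | Normal | Male | Sports and travel | 86.31 | 7.0 | 30.2085 | 634.3785 | 2/8/19 | 10:37 | Ewallet | 604.17 | 4.761905 | 30.2085 | 5.3
5 | 699-14-3026 | C | Naypyitaw | Normal | Male | Electronic accessories | 85.39 | 7.0 | 29.8865 | 627.6165 | 3/25/19 | 18:30 | Ewallet | 597.73 | 4.761905 | 29.8865 | 4.1
6 | 355-53-5943 | A | Yangon | Member | Female | NaN | 68.84 | 6.0 | 20.6520 | 433.6920 | 2/25/19 | 14:36 | Ewallet | 413.04 | 4.761905 | 20.6520 | 5.8
7 | 315-22-5665 | C | Naypyitaw | Normal | Female | NaN | 73.56 | 10.0 | 36.7800 | 772.3800 | 2/24/19 | 11:38 | Ewallet | 735.60 | 4.761905 | 36.7800 | 8.0
8 | 665-32-9167 | A | Yangon | Member | Female | NaN | 36.26 | 2.0 | 3.6260 | 76.1460 | 1/10/19 | 17:15 | Credit card | 72.52 | 4.761905 | 3.6260 | 7.2
9 | 692-92-5582 | B | Mandalay | Member | Female | NaN | 54.84 | 3.0 | 8.2260 | 172.7460 | 2/20/19 | 13:27 | Credit card | 164.52 | 4.761905 | 8.2260 | 5.9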
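The monetary columns in the sample rows appear to be derived from Unit price and Quantity. The check below reproduces row 0 under that reading: cogs as price times quantity, a 5% tax on cogs, and a total equal to cogs plus tax. These relationships are inferred from the displayed values and are not stated by the report. If they hold for every row, the gross margin percentage is 0.05 / 1.05 × 100 regardless of the row, which would be consistent with that column being flagged as constant.

```python
# Values from row 0 of the first-rows table.
unit_price, quantity = 74.69, 7.0

cogs = unit_price * quantity  # displayed as 522.83
tax = 0.05 * cogs             # "Tax 5%", displayed as 26.1415
total = cogs + tax            # displayed as 548.9715
gross_income = tax            # the two columns show the same value in each sample row
gross_margin_pct = 100 * gross_income / total  # equals 100 * 0.05 / 1.05
```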
9692-92-5582BMandalayMemberFemaleNaN54.843.08.2260172.74602/20/1913:27Credit card164.524.7619058.22605.9

Last rows

Index | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating
993 | 690-01-6631 | B | Mandalay | Normal | Male | Fashion accessories | NaN | 10.0 | 8.7450 | 183.6450 | 2/22/19 | 18:35 | Ewallet | 174.90 | 4.761905 | 8.7450 | 6.6
994 | 652-49-6720 | C | Naypyitaw | Member | Female | Electronic accessories | NaN | 1.0 | 3.0475 | 63.9975 | 2/18/19 | 11:40 | Ewallet | 60.95 | 4.761905 | 3.0475 | 5.9
995 | 233-67-5758 | C | Naypyitaw | Normal | Male | Health and beauty | NaN | 1.0 | 2.0175 | 42.3675 | 1/29/19 | 13:46 | Ewallet | 40.35 | 4.761905 | 2.0175 | 6.2
996 | 303-96-2227 | B | Mandalay | Normal | Female | Home and lifestyle | NaN | 10.0 | 48.6900 | 1022.4900 | 3/2/19 | 17:16 | Ewallet | 973.80 | 4.761905 | 48.6900 | 4.4
997 | 727-02-1313 | A | Yangon | Member | Male | Food and beverages | NaN | 1.0 | 1.5920 | 33.4320 | 2/9/19 | 13:22 | Cash | 31.84 | 4.761905 | 1.5920 | 7.7
998 | 347-56-2442 | A | Yangon | Normal | Male | Home and lifestyle | 65.82 | 1.0 | 3.2910 | 69.1110 | 2/22/19 | 15:33 | Cash | 65.82 | 4.761905 | 3.2910 | 4.1
999 | 849-09-3807 | A | Yangon | Member | Female | Fashion accessories | 88.34 | 7.0 | 30.9190 | 649.2990 | 2/18/19 | 13:28 | Cash | 618.38 | 4.761905 | 30.9190 | 6.6
1000 | 849-09-3807 | A | Yangon | Member | Female | Fashion accessories | 88.34 | 7.0 | 30.9190 | 649.2990 | 2/18/19 | 13:28 | Cash | 618.38 | 4.761905 | 30.9190 | 6.6
1001 | 745-74-0715 | A | Yangon | Normal | Male | Electronic accessories | NaN | 2.0 | 5.8030 | 121.8630 | 3/10/19 | 20:46 | Ewallet | 116.06 | 4.761905 | 5.8030 | 8.8
1002 | 452-04-8808 | B | Mandalay | Normal | Male | Electronic accessories | 87.08 | NaN | 30.4780 | 640.0380 | 1/26/19 | 15:17 | Cash | 609.56 | 4.761905 | 30.4780 | 5.5

Duplicate rows

Most frequently occurring

Index | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating | # duplicates
0 | 452-04-8808 | B | Mandalay | Normal | Male | Electronic accessories | 87.08 | NaN | 30.478 | 640.038 | 1/26/19 | 15:17 | Cash | 609.56 | 4.761905 | 30.478 | 5.5 | 2
1 | 745-74-0715 | A | Yangon | Normal | Male | Electronic accessories | NaN | 2.0 | 5.803 | 121.863 | 3/10/19 | 20:46 | Ewallet | 116.06 | 4.761905 | 5.803 | 8.8 | 2
2 | 849-09-3807 | A | Yangon | Member | Female | Fashion accessories | 88.34 | 7.0 | 30.919 | 649.299 | 2/18/19 | 13:28 | Cash | 618.38 | 4.761905 | 30.919 | 6.6 | 2
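A duplicate count like the one above can be sketched by counting identical rows. In this illustration each row is abbreviated to four of the columns, and missing values are written as None so that two rows with the same gap compare equal (distinct float NaN objects would not). The variable names are hypothetical and the rows are a small excerpt, not the full dataset.

```python
from collections import Counter

# Abbreviated rows: (Invoice ID, Product line, Unit price, Quantity).
rows = [
    ("849-09-3807", "Fashion accessories", 88.34, 7.0),
    ("849-09-3807", "Fashion accessories", 88.34, 7.0),
    ("745-74-0715", "Electronic accessories", None, 2.0),
    ("745-74-0715", "Electronic accessories", None, 2.0),
    ("347-56-2442", "Home and lifestyle", 65.82, 1.0),
]

counts = Counter(rows)
# Rows that occur more than once, with their number of occurrences.
duplicates = {row: c for row, c in counts.items() if c > 1}
# Rows that would be dropped if only the first occurrence were kept.
extra_rows = sum(c - 1 for c in duplicates.values())
```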